Abstract

Sparse learning is an important topic in many areas such as machine
learning, statistical estimation, and signal processing. Recently, there
has been growing interest in structured sparse learning. In this paper
we focus on the $\ell_q$-analysis optimization problem for structured sparse
learning ($0 < q \le 1$). Compared to previous work, we establish weaker
conditions for exact recovery in the noiseless case and a tighter non-asymptotic
upper bound on the estimation error in the noisy case. We further prove that
nonconvex $\ell_q$-analysis optimization can achieve recovery with a lower sample
complexity and over a wider range of cosparsity than its convex counterpart.
In addition, we develop an iteratively reweighted method to solve the
optimization problem under a variational framework. Theoretical analysis
shows that our method is capable of finding a local minimum close to
the global minimum. Empirical results of preliminary computational
experiments illustrate that our nonconvex method outperforms both its
convex counterpart and other state-of-the-art methods.


1 Introduction 

The sparse learning problem is widely studied in many areas including machine
learning, statistical estimation, compressed sensing, image processing, and signal
processing. Typically, this problem can be defined as the following linear
model

$$y = X\beta + w, \qquad (1)$$

where $\beta \in \mathbb{R}^d$ is the vector of regression coefficients, $X \in \mathbb{R}^{m \times d}$ is a design
matrix with possibly far fewer rows than columns, $w \in \mathbb{R}^m$ is a noise vector,
and $y \in \mathbb{R}^m$ is the noisy observation. As is well known, learning with the
$\ell_1$ norm (a convex relaxation of the $\ell_0$ norm), as in the lasso [Tibshirani, 1996] or
basis pursuit [Chen et al., 1998], encourages a sparse estimate of $\beta$. Recently, this
approach has been extended to define structured sparsity. Tibshirani and Taylor
[2011] proposed the generalized lasso


$$\min_{\beta} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|D\beta\|_1, \qquad (2)$$


which assumes that the parameter $\beta$ is sparse under a linear transformation
$D \in \mathbb{R}^{n \times d}$. An equivalent constrained version is the $\ell_1$-analysis minimization
proposed by Candes et al. [2010], i.e.,


$$\min_{\beta} \; \|D\beta\|_1 \quad \text{s.t.} \quad \|y - X\beta\|_2 \le \epsilon, \qquad (3)$$


where $D$ is called the analysis operator. In contrast to the lasso and basis
pursuit, which correspond to $D = I$, the generalized lasso and $\ell_1$-analysis minimization make a
structured sparsity assumption so that they can exploit structure in the
parameter. They include several well-known models as special cases, e.g., the fused
lasso [Tibshirani et al., 2005], the generalized fused lasso [Viallon et al., 2014], the edge
Lasso [Sharpnack et al., 2012], total variation (TV) minimization [Rudin et al.,
1992], trend filtering [Kim et al., 2009], the LLT model [Lysaker et al., 2003],
the inf-convolution model [Chambolle and Lions, 1997], etc. Additionally, the
generalized lasso and $\ell_1$-analysis minimization have been demonstrated to be
effective and even superior to standard sparse learning in many applications.


The seminal work of Fan and Li [2001] showed that nonconvex sparse
learning has better properties than its convex counterpart. Motivated by this, this
paper investigates the following $\ell_q$-analysis minimization problem ($0 < q \le 1$)


$$\min_{\beta} \; \|D\beta\|_q^q \quad \text{s.t.} \quad \|y - X\beta\|_2 \le \epsilon. \qquad (4)$$


We consider both theoretical and computational aspects. We summarize the
major contributions as follows:

• We establish weaker conditions for exact recovery in the noiseless case and
a tighter non-asymptotic upper bound on the estimation error in the noisy case.
In particular, we provide a necessary and sufficient condition guaranteeing
exact recovery via $\ell_q$-analysis minimization. To the best of our
knowledge, our work is the first study of this issue.

• We show the advantage of nonconvex $\ell_q$-analysis minimization ($q < 1$)
over its convex counterpart. Specifically, nonconvex $\ell_q$-analysis
minimization can achieve recovery with a lower sample complexity (on the
order of $qk\log(n/k)$) and over a wider range of cosparsity.

• We resort to an iteratively reweighted method to solve the $\ell_q$-analysis
minimization problem. Furthermore, we prove that our method is capable
of obtaining a local minimum close to the global minimum.
The numerical results are consistent with the theoretical analysis. For
example, nonconvex $\ell_q$-analysis minimization can indeed achieve recovery with
a smaller sample size and over a wider range of cosparsity than the convex method.
The numerical results also show that our iteratively reweighted method
outperforms other state-of-the-art methods such as NESTA [Becker et al., 2011],
the split Bregman method [Cai et al., 2009a], and the iteratively reweighted $\ell_1$ method
[Candes et al., 2007] for the $\ell_1$-analysis minimization problem, and the greedy
analysis pursuit (GAP) method [Nam et al., 2011] for the $\ell_0$-analysis minimization
problem ($q \to 0$ in (4)).


1.1 Related Work 


Candes et al. [2010] studied the $\ell_1$-analysis minimization problem in the setting
where the observation is contaminated with stochastic noise and the analysis
vector $D\beta$ is approximately sparse. They provided an $\ell_2$ norm estimation error
bound of $C_0\epsilon + C_1 k^{-1/2}\|D\beta - (D\beta)(k)\|_1$ under the assumption that $X$ obeys
the D-RIP condition $\delta_{2k} < 0.08$ or $\delta_{7k} < 0.6$ and $D$ is a Parseval tight frame¹.

Nam et al. [2011] studied the $\ell_1$-analysis minimization problem in the setting
where there is no noise and the analysis vector $D\beta$ is sparse. They showed that
a null space property with sign pattern is necessary and sufficient to guarantee
exact recovery. Liu et al. [2012] improved the analysis of Candes et al. [2010].
They established an estimation error bound similar to the one in Candes et al.
[2010] for the general frame case. For the Parseval frame case, they provided
a weaker D-RIP condition $\delta_{2k} < 0.2$.


Tibshirani and Taylor [2011] proposed the generalized lasso and developed a
LARS-like algorithm pursuing its solution path. Vaiter et al. [2013] conducted
a robustness analysis of the generalized lasso against noise. Liu et al. [2013]
derived an estimation error bound for the generalized lasso under the assumption
that the condition number of $D$ is bounded. Specifically, an $\ell_2$ norm estimation
error bound of $C\lambda + \|(X^T X)^{-1} X^T w\|_2$ is provided. Needell and Ward [2013]
investigated total variation minimization. They proved that for an image
$\beta \in \mathbb{R}^{N \times N}$, TV minimization can stably recover it with estimation error less
than $C\log(N^2/k)\,(\epsilon + \|D\beta - (D\beta)(k)\|_1/\sqrt{k})$ when the sampling matrix satisfies
the RIP of order $k$.

So far, all the related works discussed above consider convex optimization
problems. Aldroubi et al. [2012] first studied the nonconvex $\ell_q$-analysis
minimization problem (4). They established estimation error bounds using the null
space property and the restricted isometry property, respectively. For the Parseval
frame case, they showed that the D-RIP condition $\delta_{7k} < \frac{6 - 3(2/3)^{2/q - 2}}{6 - (2/3)^{2/q - 2}}$ is
sufficient to guarantee stable recovery. Li and Lin [2014] showed that the D-RIP
condition $\delta_{2k} < 0.5$ is sufficient to guarantee the success of $\ell_q$-analysis
minimization. In this paper, we significantly improve the analysis of $\ell_q$-analysis
minimization. For example, we provide a weaker D-RIP condition $\delta_{2k} < \sqrt{2}/2$.
Additionally, we show the advantage of nonconvex $\ell_q$-analysis minimization
over its convex counterpart.

¹ A set of vectors $\{d_k\}$ is a frame of $\mathbb{R}^d$ if there exist constants $0 < A \le B < \infty$ such that
$\forall \beta \in \mathbb{R}^d$, $A\|\beta\|_2^2 \le \|D\beta\|_2^2 \le B\|\beta\|_2^2$, where $\{d_k\}$ are the columns of $D^T$. When $A = B = 1$,
the columns of $D^T$ form a Parseval tight frame and $D^T D = I$.


2 Preliminaries 


Throughout this paper, $\mathbb{N}$ denotes the set of natural numbers and $\lfloor \cdot \rfloor$ denotes the rounding-down
operator. The $i$-th entry of a vector $\beta$ is denoted by $\beta_i$. The best $k$-term
approximation of a vector $\beta \in \mathbb{R}^d$ is obtained by setting its $d - k$ smallest-magnitude
components to zero and is denoted by $\beta(k)$. The $\ell_q$ norm of a vector $\beta \in \mathbb{R}^d$
is defined as $\|\beta\|_q = (\sum_{i=1}^d |\beta_i|^q)^{1/q}$ for $0 < q < \infty$.² When $q$ tends to
zero, $\|\beta\|_q^q$ is the $\ell_0$ norm $\|\beta\|_0$ used to measure the sparsity of $\beta$.
$\sigma_k(\beta)_q = \inf_{z \in \{z \in \mathbb{R}^d : \|z\|_0 \le k\}} \|\beta - z\|_q$ denotes the best $k$-term approximation error of $\beta$
with the $\ell_q$ norm. The $i$-th row of a matrix $D$ is denoted by $D_{i\cdot}$. $\sigma_{\max}(D)$
and $\sigma_{\min}(D)$ denote the maximal and minimal nonzero singular values of $D$,
respectively. Let $\kappa = \sigma_{\max}(D)/\sigma_{\min}(D)$, and let $\mathrm{Null}(X)$ denote the null space of $X$.
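The quantities defined above are simple to compute, and a small numerical sketch may help fix the notation. The function names below (`lq_power`, `best_k_term`, `sigma_k`) are illustrative choices, not part of the paper; the sketch only assumes NumPy.

```python
import numpy as np

def lq_power(beta, q):
    """Return ||beta||_q^q = sum_i |beta_i|^q for q > 0."""
    return float(np.sum(np.abs(np.asarray(beta, dtype=float)) ** q))

def best_k_term(beta, k):
    """Best k-term approximation: keep the k largest-magnitude entries, zero the rest."""
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argsort(np.abs(beta))[-k:]  # indices of the k largest |beta_i|
        out[keep] = beta[keep]
    return out

def sigma_k(beta, k, q):
    """Best k-term approximation error sigma_k(beta)_q in the l_q (quasi-)norm."""
    resid = np.asarray(beta, dtype=float) - best_k_term(beta, k)
    return lq_power(resid, q) ** (1.0 / q)
```

For instance, with `beta = [3, -1, 0.5, 0]` and `k = 2`, the best 2-term approximation keeps the entries 3 and -1, and the approximation error is carried entirely by the entry 0.5.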

Now we introduce some concepts related to the $\ell_q$-analysis minimization
problem (4). The number of zeros in the analysis vector $D\beta$ is referred to as the
cosparsity [Nam et al., 2011], and is defined as $l := n - \|D\beta\|_0$. Such a vector $\beta$
is said to be $l$-cosparse. The support of a vector $\beta$ is the collection of indices
of nonzeros in the vector, denoted by $T := \{i : \beta_i \ne 0\}$. $T^c$ denotes the
complement of $T$. The set of indices of zeros in the analysis vector $D\beta$ is defined as
the cosupport of $\beta$, denoted by $\Lambda := \{j : \langle D_{j\cdot}, \beta \rangle = 0\}$. The submatrix
$D_T$ is constructed by replacing the rows of $D$ corresponding to $T^c$ by zero rows.
Denote $D_T\beta = (D\beta)_T$. Based on these concepts, we can see that an $l$-cosparse
vector $\beta$ lies in the subspace $W_\Lambda := \{\beta : D_\Lambda\beta = 0, |\Lambda| = l\} = \mathrm{Null}(D_\Lambda)$. Here
$|\Lambda|$ is the cardinality of $\Lambda$.
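To make the cosupport and cosparsity concrete, the following sketch computes them for a given analysis operator. The helper names and the numerical tolerance `tol` are assumptions introduced here for illustration; exact zeros are replaced by a small threshold because of floating-point arithmetic.

```python
import numpy as np

def cosupport(D, beta, tol=1e-10):
    """Indices j with <D_j, beta> = 0, up to the tolerance tol."""
    return np.flatnonzero(np.abs(D @ beta) <= tol)

def cosparsity(D, beta, tol=1e-10):
    """Number of zeros in the analysis vector D beta, i.e. l = n - ||D beta||_0."""
    return int(cosupport(D, beta, tol).size)
```

As a usage example, take $D$ to be the one-dimensional first-difference operator on $\mathbb{R}^5$ (so $n = 4$) and the piecewise constant vector $\beta = (1, 1, 1, 4, 4)$. Then $D\beta = (0, 0, 3, 0)$, the cosupport is $\{0, 1, 3\}$ in zero-based indexing, and the cosparsity is $l = 3 = n - \|D\beta\|_0$.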

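In our analysis below, we use the notion of $\mathcal{A}$-RIP [Blumensath and Davies,
2008].

Definition 1 ($\mathcal{A}$-restricted isometry property) A matrix $\Phi \in \mathbb{R}^{m \times d}$ obeys
the $\mathcal{A}$-restricted isometry property with constant $\delta_{\mathcal{A}}$ over any subset $\mathcal{A} \subset \mathbb{R}^d$ if
$\delta_{\mathcal{A}}$ is the smallest quantity satisfying

$$(1 - \delta_{\mathcal{A}})\|v\|_2^2 \le \|\Phi v\|_2^2 \le (1 + \delta_{\mathcal{A}})\|v\|_2^2$$

for all $v \in \mathcal{A}$.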

Note that the RIP [Candes and Tao, 2004], the D-RIP [Candes et al., 2010], and the $\Omega$-RIP
[Giryes et al., 2013] are special instances of the $\mathcal{A}$-RIP with different choices
of the set $\mathcal{A}$. For example, when choosing $\mathcal{A} = \{Dv : v \in \mathbb{R}^d, \|v\|_0 \le k\}$
and $\mathcal{A} = \{v : v \in \mathbb{R}^d, D_\Lambda v = 0, |\Lambda| \ge l\}$, the corresponding $\mathcal{A}$-restricted
isometries are the D-RIP and the $\Omega$-RIP, respectively. It has been verified that a
random matrix $\Phi$ satisfies the $\mathcal{A}$-restricted isometry property with overwhelming
probability provided that the number of samples depends logarithmically on the
number of subspaces in $\mathcal{A}$ [Blumensath and Davies, 2008].

² $\|\cdot\|_q$ for $0 < q < 1$ is not a norm, but $d(u, v) = \|u - v\|_q^q$ for $u, v \in \mathbb{R}^d$
is a metric.
3 Main Results


In this section, we present our main theoretical results pertaining to the ability
of $\ell_q$-analysis minimization to estimate (approximately) cosparse vectors with
and without noise.


3.1 Exact Recovery in Noiseless Case 


A well-known necessary and sufficient condition guaranteeing the success of basis
pursuit is the null space property [Cohen et al., 2009]. Naturally, we define a
null space property adapted to $D$ (D-NSP$_q$) of order $k$ [Aldroubi et al., 2012]
for $\ell_q$-analysis minimization. That is,

$$\forall v \in \mathrm{Null}(X) \setminus \{0\}, \ \forall |T| \le k, \quad \|D_T v\|_q^q < \|D_{T^c} v\|_q^q. \qquad (5)$$

Theorem 1 Let $\beta \in \mathbb{R}^d$ with cosupport $\Lambda$, $\|D\beta\|_0 = k$, and $y = X\beta$. Then $\beta$
is the unique minimizer of the $\ell_q$-analysis minimization (4) with $\epsilon = 0$ if and
only if $X$ satisfies the D-NSP$_q$ (5) relative to $\Lambda^c$.


Letting the set $\Lambda$ ($|\Lambda| = l$) vary, the following result is a corollary of Theorem 1.

Corollary 1 Given a matrix $X \in \mathbb{R}^{m \times d}$ and $y = X\beta$, the $\ell_q$-analysis
minimization (4) with $\epsilon = 0$ recovers every $l$-cosparse vector $\beta \in \mathbb{R}^d$ as a unique
minimizer if and only if $X$ satisfies the D-NSP$_q$ (5) of order $n - l$.

This corollary establishes a necessary and sufficient condition for exact
recovery of all $l$-cosparse vectors via $\ell_q$-analysis minimization. It also implies
that for every $y = X\beta$ with $l$-cosparse $\beta$, the $\ell_q$-analysis minimization actually
solves the $\ell_0$-analysis minimization when the D-NSP$_q$ of order $n - l$ holds. Based
on the D-NSP$_q$, the following corollary shows that nonconvex $\ell_q$-analysis
minimization is not worse than its convex counterpart.


Corollary 2 For $0 < q_1 < q_2 \le 1$, the sufficient condition for exact recovery
via the $\ell_{q_2}$-analysis minimization is also sufficient for exact recovery via the
$\ell_{q_1}$-analysis minimization.


It is hard to check the D-NSP$_q$ (5). The following theorem provides a
sufficient condition for exact recovery using the $\mathcal{A}$-RIP.

Theorem 2 Let $\beta \in \mathbb{R}^d$, $\|D\beta\|_0 = k$, and $y = X\beta$. Assume that $D \in
\mathbb{R}^{n \times d}$ has full column rank, and its condition number is upper bounded by
$\kappa < \sqrt{\frac{2\rho + 1 + \sqrt{4\rho + 1}}{2\rho}}$. If $X \in \mathbb{R}^{m \times d}$ satisfies the $\mathcal{A}$-RIP over the set $\mathcal{A} = \{Dv :
\|v\|_0 \le (t^q + 1)k\}$ with $k, t^q k \in \mathbb{N}$, $t > 0$, $q \in (0, 1]$, i.e.,

$$\delta_{(t^q+1)k} < \frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2} \qquad (6)$$

with $\rho = t^{q-2}/4$, then $\beta$ is the unique minimizer of the $\ell_q$-analysis minimization
(4) with $\epsilon = 0$.
IIP with e = 0. 
Table 1: Different Sufficient Conditions

q      t      κ      Recovery condition
1      1      1      $\delta_{2k} < \sqrt{2}/2$
1/2    1      1      $\delta_{2k} < \sqrt{2}/2$
1      2      1      $\delta_{3k} < \sqrt{2/3}$
1/2    4      1      $\delta_{3k} < \sqrt{8/9}$
1      6      1      $\delta_{7k} < \sqrt{6/7}$
1/2    36     1      $\delta_{7k} < \sqrt{216/217}$


This theorem says that although $\ell_q$-analysis minimization is a nonconvex
optimization problem with many local minima, one can still find the global
optimum under the condition (6). As pointed out by Blanchard and Thompson
[2009], a higher-order RIP condition, such as (6), is easier to satisfy for a
larger subset of matrix ensembles such as Gaussian random matrices. Thus, our
result is meaningful both theoretically and practically.

It is easy to verify that the right-hand side of the condition (6) is monotonically
decreasing with respect to $q \in (0, 1]$ when $t > 1$. Therefore, in terms of the
$\mathcal{A}$-RIP constant $\delta_{(t^q+1)k}$ with order more than $2k$, the condition (6) is relaxed if
we use $\ell_q$-analysis minimization ($q < 1$) instead of $\ell_1$-analysis minimization.
A resulting benefit is that nonconvex $\ell_q$-analysis minimization allows
more sampling matrices to be used than its convex counterpart in compressed
sensing. Given a $\rho$, a larger condition number $\kappa$ makes the condition (6)
more restrictive, because the right-hand side of the inequality becomes
smaller. In other words, an analysis operator with too large a condition number
could cause $\ell_q$-analysis minimization to fail to recover the vector. This provides guidance on
the evaluation of the analysis operator. For example, it is reasonable to choose
a tight frame as the analysis operator in some signal processing applications.
When $q$ tends to zero, the following result is straightforward.


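Corollary 3 Let $\beta \in \mathbb{R}^d$, $y = X\beta$, and $\|D\beta\|_0 = k$. Assume that
$\delta_{2k} < \frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2}$ with $\rho = t^{-2}/4$. Then there is some small enough $q > 0$ such
that the minimizer of the $\ell_q$-analysis minimization problem (4) with $\epsilon = 0$ is
exactly $\beta$.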


Remark 1 In the case $D = I$ and $q = 1$, the condition (6) is the same as
the one of Theorem 1.1 in Cai and Zhang [2014], which is a sharp condition
for the basis pursuit problem. Table 1 shows several sufficient conditions for
exact recovery via the $\ell_q$-analysis optimization. Compared to previous work,
our results offer a significant improvement. For example, for $\ell_1$-analysis
minimization, our condition $\delta_{2k} < \sqrt{2}/2$ is weaker than the conditions $\delta_{2k} < 0.08$
in Candes et al. [2010], $\delta_{2k} < 0.2$ in Liu et al. [2012], $\delta_{2k} < 0.47$ in Lin et al.
[2013], and $\delta_{2k} < 0.49$ in Li and Lin [2014]; and our $\delta_{7k} < \sqrt{6/7}$ is weaker than
$\delta_{7k} < 0.6$ in Candes et al. [2010] and Aldroubi et al. [2012], and $\delta_{7k} < 0.687$ in
Lin et al. [2013]. For $\ell_q$-analysis minimization ($q < 1$), the D-RIP
conditions $\delta_{7k} < \frac{6 - 3(2/3)^{2/q - 2}}{6 - (2/3)^{2/q - 2}}$ [Aldroubi et al., 2012] and $\delta_{2k} < 0.5$ [Li and Lin,
2014] are both stronger than our condition (6). Note that the above results all
consider the Parseval tight frame case ($\kappa = 1$).
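The thresholds discussed above can be checked numerically. The sketch below takes the right-hand side of condition (6) to be $\frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2}$ with $\rho = t^{q-2}/4$, and the admissible condition number to be the value at which its numerator vanishes. The function names are illustrative, not from the paper.

```python
import math

def rip_threshold(q, t, kappa=1.0):
    """Right-hand side of condition (6), with rho = t**(q - 2) / 4."""
    rho = t ** (q - 2.0) / 4.0
    s = math.sqrt(4.0 * rho + 1.0)
    k2 = kappa ** 2
    return (rho * (1.0 - k2 ** 2) + k2 * s) / (rho * (k2 + 1.0) ** 2 + k2)

def kappa_limit(q, t):
    """Condition number at which the numerator of (6) reaches zero."""
    rho = t ** (q - 2.0) / 4.0
    return math.sqrt((2.0 * rho + 1.0 + math.sqrt(4.0 * rho + 1.0)) / (2.0 * rho))
```

With $\kappa = 1$ the expression reduces to $1/\sqrt{4\rho + 1}$, so `rip_threshold(1, 1)` gives $\sqrt{2}/2$ and `rip_threshold(0.5, 36)` gives $\sqrt{216/217}$, in line with Table 1. Evaluating the function on a grid of $q$ for fixed $t > 1$, or on a grid of $\kappa$ for fixed $\rho$, is a quick way to see the monotonicity statements made in the text.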


3.2 Stable Recovery in Noisy Case 

Now we consider the case where the observation is contaminated with stochastic
noise ($\epsilon \ne 0$) and the analysis vector $D\beta^*$ is approximately sparse. This is of
great interest for many applications. Our goal is to provide an estimation error bound
between the population parameter $\beta^*$ and the minimizer $\hat\beta$ of the $\ell_q$-analysis
minimization (4).

Theorem 3 Let $\beta^* \in \mathbb{R}^d$, $y = X\beta^* + w$, and $\|w\|_2 \le \epsilon$. Assume that $D \in
\mathbb{R}^{n \times d}$ has full column rank, and its condition number is upper bounded by
$\kappa < \sqrt{\frac{2\rho + 1 + \sqrt{4\rho + 1}}{2\rho}}$. If $X \in \mathbb{R}^{m \times d}$ satisfies the $\mathcal{A}$-RIP over the set $\mathcal{A} = \{Dv :
\|v\|_0 \le (t^q + 1)k\}$ with $k, t^q k \in \mathbb{N}$, $t > 0$, $q \in (0, 1]$, i.e.,

$$\delta_{(t^q+1)k} < \frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2} \qquad (7)$$

with $\rho = 4^{1/q - 2}\,t^{q-2}$, then the minimizer $\hat\beta$ of the $\ell_q$-analysis minimization
problem (4) obeys

$$\|D\hat\beta - D\beta^*\|_q^q \le 2c_1 k^{1 - q/2}\epsilon^q + 2(2c_2^q + 1)\,\sigma_k(D\beta^*)_q^q,$$

$$\|\hat\beta - \beta^*\|_2 \le \frac{2c_1}{\sigma_{\min}(D)}\,\epsilon + \frac{2^{1/q}(2c_2 + 1)}{\sigma_{\min}(D)}\,\frac{\sigma_k(D\beta^*)_q}{k^{1/q - 1/2}},$$


where 


c o — (g ~ A*) 2 (l + $(ti+i)k)K 2 - ^(1 — <5(t«+i)fc) 

+ pp 2 (n 2 ( 1 + 5(t«+i)fc) — (1 — ^(t«+i)fc))) 

2k(p — p 2 )^Jl + <5(t<i+i)fc & max (D) 

Ci . 

-Co 

_ 2 pp 2 (n 2 (l + d(t9+i)fc) — (1 — d( t i + i)fc)) 

-Co 

y/~C 0 Pp 2 {K 2 {l + 5( tq + 1 ) k ) ~ (1 - (i(tg + l)fc)) 

-CO 

and $\mu > 0$ is a constant depending on $\rho$ and $\kappa$.
The $\ell_2$ error bound shows that $\ell_q$-analysis optimization can stably recover
an approximately cosparse vector in the presence of noise. Again, we can see that an
overly ill-conditioned analysis operator leads to poor performance. Additionally, an
$\ell_q$ error bound on the difference between $\beta^*$ and $\hat\beta$ in the analysis domain is
provided, which will be used to show the advantage of $\ell_q$-analysis minimization
in the next subsection.

The linear model (1) with Gaussian noise is of particular interest in machine
learning and signal processing. Lemma 1 in Cai et al. [2009b] shows that the
norm of the noise vector $w \sim N(0, \sigma^2 I)$ is upper bounded by $\sigma\sqrt{m + 2\sqrt{m\log m}}$. The
following result is thus evident.

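Corollary 4 If the matrix $X \in \mathbb{R}^{m \times d}$ satisfies the $\mathcal{A}$-RIP condition (7) and
the noise vector $w \sim N(0, \sigma^2 I)$, then the minimizer $\hat\beta$ of (4) satisfies

$$\|\hat\beta - \beta^*\|_2 \le \frac{2c_1}{\sigma_{\min}(D)}\,\sigma\sqrt{m + 2\sqrt{m\log m}} + \frac{2^{1/q}(2c_2 + 1)}{\sigma_{\min}(D)}\,\frac{\sigma_k(D\beta^*)_q}{k^{1/q - 1/2}}$$

with probability at least $1 - 1/m$.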
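The Gaussian noise bound used in Corollary 4 can be probed empirically. The sketch below is a Monte Carlo check, not a proof: it draws noise vectors $w \sim N(0, \sigma^2 I)$ and reports how often $\|w\|_2 \le \sigma\sqrt{m + 2\sqrt{m\log m}}$ holds. The function names, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

def gaussian_noise_bound(m, sigma):
    """The bound sigma * sqrt(m + 2 * sqrt(m * log m)) on ||w||_2."""
    return sigma * np.sqrt(m + 2.0 * np.sqrt(m * np.log(m)))

def empirical_coverage(m, sigma, trials=2000, seed=0):
    """Fraction of simulated noise vectors whose l_2 norm stays below the bound."""
    rng = np.random.default_rng(seed)
    w = sigma * rng.standard_normal((trials, m))
    norms = np.linalg.norm(w, axis=1)
    return float(np.mean(norms <= gaussian_noise_bound(m, sigma)))
```

For $m = 100$ and $\sigma = 1$ the bound evaluates to roughly 11.95, while a typical noise vector has norm close to $\sqrt{m} = 10$, so the observed coverage should sit comfortably above the level $1 - 1/m$.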


3.3 Benefits of Nonconvex $\ell_q$-analysis Minimization


The advantage of nonconvex $\ell_q$-analysis minimization over its convex
counterpart is two-fold: the nonconvex approach can achieve recovery with a lower sample
complexity and over a wider range of cosparsity.

The following theorem is a natural extension of Theorem 2.7 of Foucart et al. [2010],
in which $D = I$.


Theorem 4 Let $m, n, k \in \mathbb{N}$ with $m, k < n$. Suppose that a matrix $X \in \mathbb{R}^{m \times d}$,
a linear operator $D \in \mathbb{R}^{n \times d}$, and a decoder $\Delta : \mathbb{R}^m \to \mathbb{R}^n$ solving $y = X\beta$ satisfy,
for all $\beta \in \mathbb{R}^d$,

$$\|D\beta - \Delta(X\beta)\|_q^q \le C\,\sigma_k(D\beta)_q^q$$

with some constant $C > 0$ and some $0 < q \le 1$. Then the minimal number of
samples $m$ obeys

$$m \ge C_1\,qk\log(n/3k)$$

with $k = \|D\beta\|_0$ and $C_1 = 1/(2\log(2C + 3))$.


Define the decoder $\Delta(X\beta) := D\Delta_0(y)$ with $\Delta_0(y) := \arg\min_{\beta : y = X\beta} \|D\beta\|_q^q$.
Combining this with the $\ell_q$ error bound in Theorem 3, we obtain the following result.


Corollary 5 To recover the population parameter $\beta^*$, the minimal number of
samples $m$ for $\ell_q$-analysis minimization must obey

$$m \ge C_2\,qk\log(n/4k),$$

where $k = \|D\beta^*\|_0$ and $C_2 = 1/(2\log(8c_2 + 7))$ ($c_2$ is the constant in Theorem 3).

Remark 2 In our analysis of the estimation error above, we used the $\mathcal{A}$-RIP
over the set $\mathcal{A} = \{Dv : \|v\|_0 \le (t^q + 1)k\}$, i.e., the D-RIP. As pointed out by
Candes et al. [2010], random matrices with Gaussian, subgaussian, or Bernoulli
entries satisfy the D-RIP with sample complexity on the order of $k\log(n/k)$.
This is consistent with Corollary 5 in the case $q = 1$. However, we see that
$\ell_q$-analysis minimization can have a lower sample complexity than $\ell_1$-analysis
minimization. Additionally, to guarantee the uniqueness of an $l$-cosparse
solution of $\ell_0$-analysis minimization, the minimal number of samples required
should satisfy the following condition:

$$m \ge 2 \cdot \max_{|\Lambda| \ge l} \dim(W_\Lambda), \qquad (8)$$

where $W_\Lambda = \mathrm{Null}(D_\Lambda)$. Please refer to Nam et al. [2011] for more details.
Therefore, the sample complexity of $\ell_q$-analysis minimization is lower bounded
by $2 \cdot \max_{|\Lambda| \ge l} \dim(W_\Lambda)$.


The condition (6) guarantees that cosparse vectors can be exactly recovered
via $\ell_q$-analysis minimization. Define $S_q$ ($0 < q \le 1$) as the largest value of
the sparsity $S \in \mathbb{N}$ of the analysis vector $D\beta$ such that the condition (6) holds
for some $t^q \in \frac{1}{S}\mathbb{N}$. The following theorem indicates the relationship between $S_q$
with $q < 1$ and $S_1$ with $q = 1$.

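Theorem 5 Suppose that there exist $S_1 \in \mathbb{N}$ and $t \in \frac{1}{S_1}\mathbb{N}$ such that

$$\delta_{(t+1)S_1} < \frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2}$$

with $\rho = \frac{1}{4}t^{-1}$. Then there exist $S_q \in \mathbb{N}$ and $l_q \in \frac{1}{S_q}\mathbb{N}$ obeying

$$S_q = \left\lfloor \frac{t + 1}{t^{q/(2-q)} + 1}\,S_1 \right\rfloor \qquad (9)$$

such that $(t + 1)S_1 = (l_q + 1)S_q$ and

$$\delta_{(l_q+1)S_q} < \frac{\rho(1 - \kappa^4) + \kappa^2\sqrt{4\rho + 1}}{\rho(\kappa^2 + 1)^2 + \kappa^2}$$

with $\rho = \frac{1}{4}l_q^{(q-2)/q}$.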

It can be verified that Theorem 5 also holds for the condition (7). The equation
(9) states that $\ell_q$-analysis minimization with $q < 1$ can achieve recovery over a
wider range of cosparsity than $\ell_1$-analysis minimization. For example, if
$\delta_{5S_1}$ satisfies the condition for $q = 1$ (that is, $t = 4$), then $\ell_{2/3}$-analysis minimization can recover a vector $\beta$ with
$\|D\beta\|_0 = S_{2/3} = \lfloor \frac{5}{3}S_1 \rfloor$.
4 Iteratively Reweighted Method for $\ell_q$-analysis Minimization


The iteratively reweighted method is a classical approach to dealing with $\ell_q$
norm related optimization problems; see Gorodnitsky and Rao [1997], Chartrand and Yin
[2008], Daubechies et al. [2010], and Lu [2014]. Inspired by them, we develop an
iteratively reweighted method to solve the $\ell_q$-analysis optimization. We reformulate
(4) as the following unconstrained optimization problem:

$$\min_{\beta}\left\{\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|D\beta\|_q^q\right\}. \qquad (10)$$

It is hard to solve (10) directly due to the nonsmoothness and nonseparability
of the $\ell_q$ norm term. We provide a way to deal with the $\ell_q$ norm under the
variational framework.

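Note that the function $\|D\beta\|_q^q$ is concave with respect to $|D\beta|^\alpha = (|D_{1\cdot}\beta|^\alpha, \ldots, |D_{n\cdot}\beta|^\alpha)^T$
for $\alpha \ge 1$. Thus there exists a variational upper bound of $\|D\beta\|_q^q$. Given a
positive vector $\eta = (\eta_1, \ldots, \eta_n)^T$, we have the following variational formulation,

$$\|D\beta\|_q^q = \sum_{i=1}^n \left(|D_{i\cdot}\beta|^\alpha\right)^{q/\alpha} = \min_{\eta > 0}\left\{ J_\alpha \triangleq \frac{q}{\alpha}\sum_{i=1}^n \left( \eta_i |D_{i\cdot}\beta|^\alpha + \frac{\alpha - q}{q}\frac{1}{\eta_i^{q/(\alpha - q)}} \right) \right\}$$

for $\alpha \ge 1$ and $0 < q \le 1$. The function $J_\alpha$ is jointly convex in $(\beta, \eta)$. Its
minimum is achieved at $\eta_i = 1/|D_{i\cdot}\beta|^{\alpha - q}$, $i = 1, \ldots, n$. However, when $\beta$ is
orthogonal to some $D_{i\cdot}$, the weight vector $\eta$ may include infinite components.
To avoid an infinite weight, we add a smoothing term $\frac{q}{\alpha}\eta_i\epsilon^\alpha$ ($\epsilon > 0$) to
$J_\alpha$.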
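The stated minimizer can be verified by minimizing each summand separately. The short derivation below is a sketch under the assumptions $|D_{i\cdot}\beta| > 0$ and $\alpha > q$; the shorthand $a_i$ and the function $g$ are introduced here only for this check.

```latex
\begin{aligned}
g(\eta_i) &= \eta_i a_i + \frac{\alpha - q}{q}\,\eta_i^{-q/(\alpha - q)},
  \qquad a_i = |D_{i\cdot}\beta|^{\alpha},\\
g'(\eta_i) &= a_i - \eta_i^{-\alpha/(\alpha - q)} = 0
  \;\Longrightarrow\;
  \eta_i^{\star} = a_i^{-(\alpha - q)/\alpha} = \frac{1}{|D_{i\cdot}\beta|^{\alpha - q}},\\
g(\eta_i^{\star}) &= a_i^{q/\alpha} + \frac{\alpha - q}{q}\,a_i^{q/\alpha}
  = \frac{\alpha}{q}\,|D_{i\cdot}\beta|^{q},\\
\frac{q}{\alpha}\sum_{i=1}^{n} g(\eta_i^{\star}) &= \sum_{i=1}^{n} |D_{i\cdot}\beta|^{q} = \|D\beta\|_q^q .
\end{aligned}
```

Since $g$ is convex in $\eta_i > 0$, the stationary point is the minimizer. Replacing $a_i$ by $a_i + \epsilon^\alpha$ gives the smoothed weight $\eta_i = (|D_{i\cdot}\beta|^\alpha + \epsilon^\alpha)^{q/\alpha - 1}$, which stays finite when $D_{i\cdot}\beta = 0$.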

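Using the above variational formulation, we obtain an approximation of the
problem (10) as

$$\min_{\beta}\left\{ F(\beta, \epsilon) \triangleq \min_{\eta > 0}\ \frac{1}{2}\|y - X\beta\|_2^2 + \frac{\lambda q}{\alpha}\sum_{i=1}^n\left[\eta_i\left(|D_{i\cdot}\beta|^\alpha + \epsilon^\alpha\right) + \frac{\alpha - q}{q}\frac{1}{\eta_i^{q/(\alpha - q)}}\right]\right\}. \qquad (11)$$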

We then develop an alternating minimization algorithm, which consists of three
steps. The first step calculates $\eta$ with $\beta$ fixed via

$$\eta^{(k)} = \arg\min_{\eta > 0}\left\{\sum_{i=1}^n\left[\eta_i\left(|D_{i\cdot}\beta^{(k-1)}|^\alpha + \epsilon^\alpha\right) + \frac{\alpha - q}{q}\frac{1}{\eta_i^{q/(\alpha - q)}}\right]\right\},$$

which has a closed-form solution. The second step calculates $\beta$ with $\eta$ fixed via

$$\beta^{(k)} = \arg\min_{\beta \in \mathbb{R}^d}\left\{\frac{1}{2}\|y - X\beta\|_2^2 + \frac{\lambda q}{\alpha}\sum_{i=1}^n \eta_i^{(k)}|D_{i\cdot}\beta|^\alpha\right\},$$

which is a weighted $\ell_\alpha$-minimization problem. In particular, the case $\alpha = 2$
corresponds to a least squares problem which can be solved efficiently. The
third step updates the smoothing parameter $\epsilon$ according to the following rule³

$$\epsilon^{(k)} = \min\{\epsilon^{(k-1)}, \rho \cdot r(D\beta^{(k)})_l\} \quad \text{with } \rho \in (0, 1),$$

where $r(D\beta)_l$ is the $l$-th smallest element of the set $\{|D_{j\cdot}\beta| : j = 1, \ldots, n\}$. $\beta$ is
an $l$-cosparse vector if and only if $r(D\beta)_l = 0$. The algorithm stops when $\epsilon = 0$.

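Algorithm 1 The CoIRLq Algorithm
Input: $l$, $X$, $y$, $D = [D_{1\cdot}; \ldots; D_{n\cdot}]$.
Init: Choose $\beta^{(0)}$ such that $X\beta^{(0)} = y$ and $\epsilon^{(0)} = 1$.
while $\|\beta^{(k+1)} - \beta^{(k)}\|_\infty > \tau$ or $\epsilon^{(k)} \ne 0$ do
    Update
        $\eta_i^{(k)} = \left(|D_{i\cdot}\beta^{(k-1)}|^\alpha + (\epsilon^{(k-1)})^\alpha\right)^{q/\alpha - 1}$, $i = 1, \ldots, n$.
    Update
        $\beta^{(k)} = \arg\min_{\beta \in \mathbb{R}^d}\left\{\frac{1}{2}\|y - X\beta\|_2^2 + \frac{\lambda q}{\alpha}\sum_{i=1}^n \eta_i^{(k)}|D_{i\cdot}\beta|^\alpha\right\}$.
    Update
        $\epsilon^{(k)} = \min\{\epsilon^{(k-1)}, \rho \cdot r(D\beta^{(k)})_l\}$ with $\rho \in (0, 1)$.
end while
Output: $\beta$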
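The three update steps can be sketched in a few lines for the case $\alpha = 2$, where the $\beta$-step reduces to the linear system $(X^T X + \lambda q D^T \mathrm{diag}(\eta) D)\beta = X^T y$. This is an illustrative sketch, not the authors' implementation: the function name `coirlq`, the default values, the floor `eps_min` on the smoothing parameter (used here to keep the weights finite), and the stopping rule based only on the change in $\beta$ are all assumptions made for this example.

```python
import numpy as np

def coirlq(X, y, D, l, q=0.7, lam=1e-4, rho=0.5, tol=1e-8,
           eps_min=1e-8, max_iter=200):
    """Illustrative CoIRLq-style iteration for alpha = 2 (a sketch, not reference code)."""
    # Init: minimum-norm solution of X beta = y, and eps = 1.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    eps = 1.0
    XtX = X.T @ X
    Xty = X.T @ y
    for _ in range(max_iter):
        # Step 1: closed-form weights eta_i = (|D_i beta|^2 + eps^2)^(q/2 - 1).
        eta = ((D @ beta) ** 2 + eps ** 2) ** (q / 2.0 - 1.0)
        # Step 2: weighted least squares, i.e. the normal equations of
        # 0.5 * ||y - X b||^2 + (lam * q / 2) * sum_i eta_i * (D_i b)^2.
        A = XtX + lam * q * (D.T * eta) @ D
        beta_new = np.linalg.solve(A, Xty)
        # Step 3: shrink eps using the l-th smallest |D_j beta|,
        # floored at eps_min (an assumption of this sketch).
        r_l = np.sort(np.abs(D @ beta_new))[l - 1]
        eps = max(min(eps, rho * r_l), eps_min)
        change = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if change <= tol:
            break
    return beta
```

The linear system in step 2 is positive definite whenever $D$ has full column rank and all weights are positive, which is why a direct solve suffices here. For large problems such as images, an iterative solver applied to the same system would be the more natural choice.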


4.1 Convergence Analysis 

Our analysis is based on the optimization problem (11) with the objective
function $F(\beta, \epsilon)$. Noting that $\eta^{(k+1)}$ is a function of $\beta^{(k)}$ and $\epsilon^{(k)}$, we define the
following objective function

$$Q(\beta, \epsilon \mid \beta^{(k)}, \epsilon^{(k)}) \triangleq \frac{1}{2}\|y - X\beta\|_2^2 + \frac{\lambda q}{\alpha}\sum_{i=1}^n\left[\eta_i^{(k+1)}\left(|D_{i\cdot}\beta|^\alpha + \epsilon^\alpha\right) + \frac{\alpha - q}{q}\frac{1}{(\eta_i^{(k+1)})^{q/(\alpha - q)}}\right].$$

Lemma 1 Assume that the analysis operator $D$ has full column rank. Let
$\{(\beta^{(k)}, \epsilon^{(k)}) : k = 1, 2, \ldots\}$ be a sequence generated by the CoIRLq algorithm.
Then $\|\beta^{(k)}\|_2$ is bounded, and

$$F(\beta^{(k+1)}, \epsilon^{(k+1)}) \le F(\beta^{(k)}, \epsilon^{(k)}),$$

with equality holding if and only if $\beta^{(k+1)} = \beta^{(k)}$ and $\epsilon^{(k+1)} = \epsilon^{(k)}$.

³ Various strategies can be applied to update $\epsilon$. For example, we can keep $\epsilon$ as a small fixed
value. It is preferred to choose a sequence $\{\epsilon^{(k)}\}$ tending to zero [Daubechies et al., 2010].

The boundedness of $\|\beta^{(k)}\|_2$ implies that the sequence $\{\beta^{(k)}\}$ has at least
one accumulation point. We can immediately derive the convergence property
of the CoIRLq algorithm from Zangwill's global convergence theorem or from
Sriperumbudur and Lanckriet [2009]. Here we omit the details. Finally,
it is easy to verify that when $\epsilon^* = 0$, $\beta^*$ is a stationary point of (10).


4.2 Recovery Guarantee Analysis 

To uniquely recover the true parameter, the linear operator $X : \mathcal{A} \to \mathbb{R}^m$
must be a one-to-one map. Define a set $\bar{\mathcal{A}} = \{\beta = \beta^1 + \beta^2 : \beta^1, \beta^2 \in \mathcal{A}\}$.
Blumensath and Davies [2008] showed that a necessary condition for the
existence of a one-to-one map is that $\delta_{\bar{\mathcal{A}}} < 1$ ($\delta_{\mathcal{A}} \le \delta_{\bar{\mathcal{A}}}$). For any two
$l$-cosparse vectors $\beta^1, \beta^2 \in \mathcal{A} = \{\beta : D_\Lambda\beta = 0, |\Lambda| \ge l\}$, denote $T_1 =
\mathrm{supp}(D\beta^1)$, $T_2 = \mathrm{supp}(D\beta^2)$, $\Lambda_1 = \mathrm{cosupp}(D\beta^1)$, and $\Lambda_2 = \mathrm{cosupp}(D\beta^2)$.
Since $\mathrm{supp}(D(\beta^1 + \beta^2)) \subseteq T_1 \cup T_2$, we have $\mathrm{cosupp}(D(\beta^1 + \beta^2)) \supseteq (T_1 \cup T_2)^c =
T_1^c \cap T_2^c = \Lambda_1 \cap \Lambda_2$. Moreover, we also have $|\Lambda_1 \cap \Lambda_2| = n - |T_1 \cup T_2| \ge
n - (n - l) - (n - l) = 2l - n$. Thus the linear operator $X$ must
satisfy the $\mathcal{A}$-RIP with $\delta_{2l-n} < 1$ to uniquely recover any $l$-cosparse vector
from the set $\mathcal{A} = \{\beta : D_\Lambda\beta = 0, |\Lambda| \ge l\}$. Otherwise, there would exist two
$l$-cosparse vectors $\beta^1 \ne \beta^2$ such that $X(\beta^1 - \beta^2) = 0$. Giryes et al. [2013] showed
that there exists a random matrix $X$ satisfying such a requirement with high
probability.

Theorem 6 Let $\beta^* \in \mathbb{R}^d$ be an $l$-cosparse vector, and $y = X\beta^* + w$ with $\|w\|_2 \le
\epsilon$. Assume that $X$ satisfies the $\mathcal{A}$-RIP over the set $\mathcal{A} = \{\beta : D_\Lambda\beta = 0, |\Lambda| \ge l\}$
of order $2l - n$ with $\delta_{2l-n} < 1$. Then the solution $\hat\beta$ obtained by the CoIRLq
algorithm obeys

$$\|\hat\beta - \beta^*\|_2 \le C_1\sqrt{\lambda} + C_2\epsilon,$$

where $C_1$ and $C_2$ are constants depending on $\delta_{2l-n}$.

We can see that in the noiseless case the CoIRLq algorithm recovers an approximate
solution whose distance from the true parameter vector is on the order of $\sqrt{\lambda}$.

5 Numerical Analysis 

In this section we conduct numerical analysis of the $\ell_q$-analysis minimization
method on both simulated data and real data, and compare the performance of
the case $q < 1$ with the case $q = 1$. We set $\alpha = 2$ in the CoIRLq algorithm.
5.1 Cosparse Vector Recovery

We generate the simulated datasets according to

$$y = X\beta + w,$$

where $w \sim N(0, \sigma^2 I)$. The sampling matrix $X$ is drawn independently from
the normal distribution with normalized columns. The analysis operator $D$
is constructed such that $D^T$ is a random tight frame. To generate an $l$-cosparse
vector $\beta$, we first choose $l$ rows randomly from $D$ and form $D_\Lambda$. Then we generate
a vector which lies in the null space of $D_\Lambda$. The recovery is deemed to be
successful if the recovery relative error $\|\hat\beta - \beta^*\|_2/\|\beta^*\|_2 \le 10^{-4}$.
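The data generation just described can be sketched as follows. The construction of the tight frame through a QR factorization of a Gaussian matrix, the null-space computation through an SVD, and the function name `make_cosparse_problem` are assumptions of this sketch; the paper does not specify how these steps were implemented.

```python
import numpy as np

def make_cosparse_problem(m, n, d, l, sigma=0.0, seed=0):
    """Generate (X, y, D, beta, cosupport) for a synthetic l-cosparse recovery problem."""
    rng = np.random.default_rng(seed)
    # Random tight frame: D has orthonormal columns, so D^T D = I.
    D, _ = np.linalg.qr(rng.standard_normal((n, d)))
    # Gaussian sampling matrix with unit-norm columns.
    X = rng.standard_normal((m, d))
    X /= np.linalg.norm(X, axis=0)
    # Choose l rows of D at random and take a random vector in Null(D_Lambda).
    cosupp = rng.choice(n, size=l, replace=False)
    _, s, Vt = np.linalg.svd(D[cosupp], full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:].T              # shape d x (d - rank)
    beta = null_basis @ rng.standard_normal(null_basis.shape[1])
    y = X @ beta + sigma * rng.standard_normal(m)
    return X, y, D, beta, cosupp
```

With $m = 80$, $n = 144$, $d = 120$, and $l = 99$, the null space of $D_\Lambda$ generically has dimension $d - l = 21$. When $l \ge d$ the null space is trivial and the generated vector is zero.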

In the first experiment, we test the vector recovery capability of the CoIRLq
method with $q = 0.7$. We set $m = 80$, $n = 144$, $d = 120$, $l = 99$, and $\sigma =
0$. Figure 1 illustrates that the CoIRLq method recovers the original vector
perfectly.



Figure 1: Cosparse vector recovery. 


In the second experiment, we test the CoIRLq method over a range of sample
sizes and cosparsities with different $q$. Although the optimal tuning parameter $\lambda$
depends on $q$, a small enough $\lambda$ ensures that $y$ approximately equals
$X\beta$ in the noiseless case. Thus, we set $\lambda = 10^{-4}$ for all $q$ and $\sigma = 0$. Figure
2 reports the result with 100 repetitions on every dataset. We can see that the
CoIRLq method with $q = 0.5, 0.7, 0.8$ achieves exact recovery over a wider range
of cosparsity and with fewer samples than with $q = 1$. In addition, it should
be noted that small values such as $q = 0.1$ or $q = 0.3$ do not perform better than the relatively
large values $q = 0.7, 0.8$, because too small a $q$ leads to a problem that is hard to solve. Note
that there is a drop in recovery probability at cosparsity $l = 118$.⁴ This
is because it is hard to algorithmically recover a vector residing in a subspace
with a small dimension; see also Nam et al. [2011].

In the third experiment, we compare the CoIRLq method with three state-of-the-art
methods for the $\ell_1$-analysis minimization problem: NESTA
(http://statweb.stanford.edu/~candes/nesta/), the split Bregman method, and the
iteratively reweighted $\ell_1$ (IRL1) method. We set the noise level $\sigma = 0.01$. The
parameter $\lambda$ is tuned via grid search. We run these methods over a
range of sample sizes and cosparsities. Figure 3 reports the result with 100 repetitions
on every dataset. We can see that nonconvex $\ell_q$-analysis minimization
with $q < 1$ is more capable of achieving exact recovery against noise than
convex $\ell_1$-analysis minimization. Moreover, the nonconvex approach can obtain
exact recovery with fewer samples or over a wider range of cosparsity than its
convex counterpart. We also found that the CoIRLq algorithm in the case
$q < 1$ often needs fewer iterations than in the case $q = 1$.

⁴ When $l = 120$, a zero vector is generated by our code, so the recovery probability at
cosparsity $l = 120$ is zero.

Figure 2: Exact recovery probability of the CoIRLq method.

Figure 3: Recovery probability of the CoIRLq, NESTA, IRL1 and split Bregman
methods.

5.2 Image Restoration Experiment 

In this section we demonstrate the effectiveness of $\ell_q$-analysis minimization
on the Shepp-Logan phantom reconstruction problem. In computed tomography,
an image cannot be observed directly. Instead, we can only obtain its 2D
Fourier transform coefficients along a few radial lines due to certain limitations.
This sampling process can be modeled as a measurement matrix $X$. The goal
is to reconstruct the image from the observation.

Figure 4: (a) Original Shepp-Logan phantom image; (b) Sampling locations
along 10 radial lines; (c) Exact reconstruction via CoIRLq ($q = 0.7$) with 10 lines
without noise; (d) Exact reconstruction via CoIRLq ($q = 1$) with 12 lines without
noise; (e) Sampling locations along 15 radial lines; (f) Reconstruction via GAP
with 15 lines and noise level $\sigma = 0.01$ (SNR = 30.5); (g) Reconstruction via CoIRLq ($q = 0.7$)
with 15 lines and noise level $\sigma = 0.01$ (SNR = 45.7); (h) Reconstruction via CoIRLq ($q = 1$)
with 15 lines and noise level $\sigma = 0.01$ (SNR = 43).

The experimental setup is as follows. The image dimension is
$256 \times 256$, namely $d = 65536$. The measurement matrix $X$ is a two-dimensional
Fourier transform which measures the image's Fourier transform along a few
radial lines. The analysis operator is a finite difference operator $D_{\text{2D-DIF}}$ whose
size is roughly twice the image size, namely $n = 130560$. Since the number of
nonzero analysis coefficients is $n - l = 2546$, the cosparsity used is $l = n - 2546 =
128014$. The number of measurements depends on the number of radial lines
used. To show the reconstruction capability of the CoIRLq method, we conduct
the following experiments (the parameter $\lambda$ is tuned via grid search). First, we
compare our method with the greedy analysis pursuit (GAP, http://www.small-project.eu/software-data)
method for $\ell_0$-analysis minimization.
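The dimensions quoted above can be reproduced with a small sketch of a two-dimensional finite difference operator. It is assumed here that the operator stacks horizontal and vertical first differences without wrap-around, which is consistent with $n = 2 \cdot 256 \cdot 255 = 130560$; the function name `finite_diff_2d` is illustrative and the operator is applied as a function rather than stored as a matrix.

```python
import numpy as np

def finite_diff_2d(img):
    """Apply horizontal and vertical first differences and stack them into one vector."""
    img = np.asarray(img, dtype=float)
    dh = np.diff(img, axis=1).ravel()  # N rows, N - 1 horizontal differences each
    dv = np.diff(img, axis=0).ravel()  # N - 1 vertical differences in each of N columns
    return np.concatenate([dh, dv])
```

For a piecewise constant image, the analysis vector is nonzero only across region boundaries, which is why the phantom has so few nonzero analysis coefficients relative to $n$. As a toy example, an $8 \times 8$ image of zeros with a $3 \times 3$ block of ones has $n = 112$ analysis coefficients, of which only 12 are nonzero.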

Figures 4(f), (g), and (h) show that our method performs better than the
GAP method in the noisy case. We can see that the CoIRLq method with $q < 1$
is more robust to noise than the case with $q = 1$. Second, we run an experiment
using 10 radial lines without noise. The corresponding number of measurements
is $m = 2282$, which is approximately 3.48% of the image size. Figure 4(c)
demonstrates that the CoIRLq ($q = 0.7$) method with 10 lines obtains perfect
reconstruction. Figure 4(d) shows that the CoIRLq ($q = 1$) method with 12
lines attains perfect reconstruction. However, the GAP method needs at least
12 radial lines to achieve exact recovery; see Nam et al. [2011].


6 Conclusion 

In this paper we have conducted a theoretical analysis of, and developed a
computational method for, the $\ell_q$-analysis minimization problem. Theoretically, we
have established weaker conditions for exact recovery in the noiseless case and a
tighter non-asymptotic upper bound on the estimation error in the noisy case. In particular,
we have presented a necessary and sufficient condition guaranteeing exact
recovery. Additionally, we have shown that nonconvex $\ell_q$-analysis optimization
can achieve recovery with a lower sample complexity and over a wider range of
cosparsity. Computationally, we have devised an iteratively reweighted method
to solve the $\ell_q$-analysis optimization problem. Empirical results have illustrated
that our iteratively reweighted method outperforms state-of-the-art methods.